Adding New Words into a Language Model using Parameters of Known Words with Similar Behavior
Similar resources
A MODEL FOR EVOLUTIONARY DYNAMICS OF WORDS IN A LANGUAGE
Human language, over its evolutionary history, has emerged as one of the fundamental defining characteristics of the modern man. However, this milestone evolutionary process through natural selection has not left any ‘linguistic fossils’ that may enable us to trace back the actual course of development of language and its establishment in human societies. Lacking analytical tools to fathom the cr...
Identifying Similar Words and Contexts in Natural Language with SenseClusters
SenseClusters is a freely available intelligent system that clusters together similar contexts in natural language text. Thereafter it assigns identifying labels to these clusters based on their content. It is a purely unsupervised approach that is language independent, and uses no knowledge other than what is available in raw un-annotated corpora. In addition to clustering similar contexts, it...
Segmenting DNA sequence into 'words' based on statistical language model
This paper presents a novel method to segment/decode DNA sequences based on an n-gram statistical language model. First, we find that the length of most DNA “words” is 12 to 15 bp by analyzing the genomes of 12 model species. The bound of language entropy of DNA sequence is about 1.5674 bits. After building an n-gram biological language model, we design an unsupervised ‘probability approach...
The latent words language model
Statistical language models have found many applications in information retrieval since their introduction almost three decades ago. Currently the most popular models are n-gram models, which are known to suffer from serious sparseness issues, which is a result of the large vocabulary size |V| of any given corpus and of the exponential nature of n-grams, where potentially |V|^n n-grams can occu...
Journal
Journal title: Procedia Computer Science
Year: 2018
ISSN: 1877-0509
DOI: 10.1016/j.procs.2018.03.003